Abstract. We give processor-allocation algorithms for grid architectures, where the objective is to select processors from a set of available
processors to minimize the average number of communication hops.
The associated clustering problem is as follows: Given $n$ points in $\mathbb{R}^d$, find
a size-$k$ subset with minimum average pairwise $L_1$ distance. We present a
natural approximation algorithm and show that it is a $\frac{7}{4}$-approximation
for 2D grids. In $d$ dimensions, the approximation guarantee is $2 - \frac{1}{2d}$,
which is tight. We also give a polynomial-time approximation scheme
(PTAS) for constant dimension $d$ and report on experimental results.



1 Introduction 

We give processor-allocation algorithms for grid architectures. Our objective 
is to select processors to run a job from a set of available processors so that the 
average number of communication hops between processors assigned to the job 
is minimized. Our problem is restated as follows: given a set $P$ of $n$ points in
$\mathbb{R}^d$, find a subset $S$ of $k$ points with minimum average pairwise $L_1$ distance.
Motivation: Processor Allocation in Supercomputers. Our algorithmic work is
motivated by a problem in the operation of supercomputers. Specifically, we 
targeted our algorithms and simulations at Cplant [7,26], a commodity-based 
supercomputer developed at Sandia National Laboratories, and Red Storm, a 
custom supercomputer being developed at Cray, though other supercomputers 
at Sandia have similar features. In these systems, a scheduler selects the next 
job to run based on priority. The allocator then independently places the job on 
a set of processors which exclusively run that job to completion. Security con- 
straints forbid migration, preemption, or multitasking. These constraints make 
the allocation decision more important since it cannot be changed once made. 

To obtain maximum throughput in a network-limited computing system, the 
processors allocated to a single job should be physically near each other. This 
placement reduces communication costs and avoids bandwidth contention caused 
by overlapping jobs. Experiments have shown that allocating nearby processors 
to each job can improve throughput on a range of architectures [3, 18,21,22,24]. 
Several papers suggest that minimizing the average number of communication 
hops is an appropriate metric for job placement [16,21,22]. Experiments with 
a communication test suite demonstrate that this metric correlates with a job's 
completion time [18]. 

Early processor-allocation algorithms allocate only convex sets of processors 
to each job [6,9, 19,30]. For such allocations, each job's communication can 
be routed entirely within processors assigned to that job, so jobs contend only 
with themselves. But requiring convex allocations reduces the achievable system 
utilization to levels unacceptable for a government-audited system [15,27]. 




[Figure 1 legend: free processors; allocated processors.]

Fig. 1. Illustration of MC: shells around a processor for a $3 \times 1$ request.



Recent work [8,18,20,23,27] allows discontiguous allocation of processors but 
tries to cluster them and minimize contention with previously allocated jobs. 
Mache, Lo, and Windisch [23] propose the MC algorithm for grid architectures: 
For each free processor, algorithm MC evaluates the quality of an allocation 
centered on that processor. It counts the number of free processors within a 
submesh of the requested size centered on the given processor and within "shells" 
of processors around this submesh; see Figure 1, reproduced from [23]. The cost
of an allocation is the sum of the shell numbers of the allocated processors. MC
chooses the allocation with lowest cost. Since users at Sandia do not request
processors in a particular shape, in this paper, we consider MC1x1, a variant in
which shell 0 is $1 \times 1$ and subsequent shells grow in the same way as in MC.
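A minimal Python sketch of MC1x1 on a 2D grid follows. It assumes, as one reading of "shells grow in the same way as in MC", that shell $i$ around a center is the square ring of processors at $L_\infty$ distance $i$; the names `shell` and `mc1x1` are illustrative and not taken from the Cplant code.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int]


def shell(center: Point, q: Point) -> int:
    """Shell number of q around center, assuming square (L-infinity) shells."""
    return max(abs(q[0] - center[0]), abs(q[1] - center[1]))


def mc1x1(free: List[Point], k: int) -> Optional[List[Point]]:
    """Sketch of MC1x1: try every free processor as the center, take the k free
    processors in the innermost shells, and keep the allocation of lowest cost
    (sum of shell numbers). Returns None if fewer than k processors are free."""
    if k > len(free):
        return None
    best_cost: Optional[int] = None
    best_alloc: Optional[List[Point]] = None
    for c in free:
        # sorted() is stable, so ties between equal shells follow list order.
        ranked = sorted(free, key=lambda q: shell(c, q))
        alloc = ranked[:k]
        cost = sum(shell(c, q) for q in alloc)
        if best_cost is None or cost < best_cost:
            best_cost, best_alloc = cost, alloc
    return best_alloc
```

A 3D version would change only the `shell` function (a maximum over three coordinates).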

Originally, processor allocation on the Cplant system was not based on the 
locations of the free processors. The allocator simply verified that enough pro- 
cessors were free before dispatching a job. The current allocator uses space-filling 
curves and 1D bin-packing techniques based upon work of Leung et al. [18]. We
also have Cplant implementations of a 3D version of MC1x1 and the greedy
heuristic (called MM) analyzed in this paper. 

Related Algorithmic Work. Krumke et al. [16] consider a generalization of our 
problem on arbitrary topologies for several measures of locality, motivated by 
allocation on the CM5. They prove it is NP-hard to approximate average pairwise 
distance in general, but give a 2-approximation for distances obeying the triangle 
inequality. 

A natural special case of the allocation problem is the unconstrained problem, 
in the absence of occupied processors: For any number $k$, find $k$ grid points
minimizing average pairwise $L_1$ distance. For moderate values of $k$, these sets
can be found by exhaustive search; see Figure 2. The resulting shapes appear to
approximate some "ideal" rounded shape, with better and better approximation
for growing $k$. Karp et al. [14] and Bender et al. [4] study the exact nature
of this shape, shown in Figure 3. Surprisingly, there is no known closed-form
solution for the resulting convex curve, but Bender et al. [4] have expressed it 
as a differential equation. The complexity of this special case remains open, but 
its mathematical difficulty suggests the hardness of obtaining good solutions for 
the general constrained problem. 
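The exhaustive search for small $k$ can be sketched in Python as below. The window size is an assumption (it only has to be large enough to contain an optimal shape), and the search is exponential in $k$, so it is meant only for the smallest values shown in Figure 2. The pairwise sum is computed per coordinate after sorting: the $i$-th smallest of $m$ values enters with coefficient $2i - m + 1$.

```python
from itertools import combinations, product
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def total_l1(points: Sequence[Point]) -> int:
    """Sum of pairwise L1 distances, one coordinate at a time: after sorting,
    the i-th smallest of m values carries coefficient 2i - m + 1."""
    m = len(points)
    total = 0
    for axis in (0, 1):
        coords = sorted(p[axis] for p in points)
        total += sum((2 * i - m + 1) * c for i, c in enumerate(coords))
    return total


def best_unconstrained(k: int, window: int) -> Tuple[int, List[Point]]:
    """Exhaustive search over all k-subsets of a window x window block of grid
    points; returns the minimum total pairwise distance and one optimal set."""
    grid = list(product(range(window), repeat=2))
    best = min(combinations(grid, k), key=total_l1)
    return total_l1(best), list(best)
```

For example, `best_unconstrained(4, 4)[0]` is 8, the numerator of the 8/6 panel in Figure 2.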
[Figure 2 panel labels (fraction = average $L_1$ distance): 1/1 = 1.000, 4/3 = 1.333, 8/6 = 1.333, 16/10 = 1.600, 38/21 = 1.809, 54/28 = 1.928, 72/36 = 2.000, 96/45 = 2.133, 152/66 = 2.303, 188/78 = 2.410, 227/91 = 2.494, 272/105 = 2.590, 318/120 = 2.650, 374/136 = 2.750, 433/153 = 2.830, 496/171 = 2.900, 563/190 = 2.963, 632/210 = 3.009]

Fig. 2. Optimal unconstrained clusters for small values of $k$; numbers shown are the
average $L_1$ distances, with truncated decimal values.

Fig. 3. Plot from Bender et al. [4] of a quarter of the optimal limiting boundary curve
for the unconstrained problem; the dotted line is a circle.

In reconfigurable computing on field-programmable gate arrays (FPGAs),
varying processor sizes give rise to a generalization of our problem: place a
set of rectangular modules on a grid to minimize the overall weighted sum of
$L_1$ distances between modules. Ahmadinia et al. [1] give an optimal $O(n \log n)$
algorithm for finding an optimal feasible location for a module given a set of 
n existing modules. At this point, no results are known for the general off-line 
problem (place n modules simultaneously) or for on-line versions. 

Another related problem is min-sum k-clustering: separate a graph into k 
clusters to minimize the sum of distances between nodes in the same cluster. For 
general graphs, Sahni and Gonzalez [25] show it is NP-hard to approximate this 
problem to within any constant factor for $k \ge 3$. In a metric space, Guttmann-Beck
and Hassin [12] give a 2-approximation, Indyk [13] gives a PTAS for $k = 2$,
and Bartal et al. [2] give an $O((1/\varepsilon)\log^{1+\varepsilon} n)$-approximation for general $k$.

Fekete and Meijer [11] consider the problem of maximizing the average $L_1$
distance. They give a PTAS for this dispersion problem in $\mathbb{R}^d$ for constant $d$,
and show that an optimal set of any fixed size can be found in $O(n)$ time.

Our Results. We consider algorithms for minimizing the average $L_1$ distance
between allocated processors in a mesh supercomputer. In particular, we give
the following results:

- We prove that a greedy algorithm we call MM is a $\frac{7}{4}$-approximation algorithm
for 2D grids. This improves on the previous best factor of 2 [16]. We show
that this analysis is tight.

- We present a simple generalization of MM to $d$-dimensional grids and prove
that it gives a $(2 - \frac{1}{2d})$-approximation, which is tight.

- We give a polynomial-time approximation scheme (PTAS) for points in $\mathbb{R}^d$
for constant $d$.

- Using simulations, we compare the allocation performance of MM to that of
other algorithms. As a byproduct, we gain insight into how to place a stream
of jobs in an online setting.

- We give an algorithm that exactly solves the 2-dimensional case for $k = 3$ in
time $O(n \log n)$.

- We prove that the $d$-dimensional version of MC1x1 has approximation factor
at most $d$ times that of MM.

Our work also led to a linear-time dynamic programming algorithm for the 
1-dimensional problem of points on a line or ring; see Leung et al. [5] for details. 

2 Algorithms for Two-Dimensional Point Sets 

2.1 Manhattan Median Algorithm 

Given a set $S$ of $k$ points in the plane, a point that minimizes the total
$L_1$ distance to these points is called an ($L_1$) median. Given the nature of $L_1$
distances, this is a point whose $x$-coordinate (resp. $y$-coordinate) is the median
of the $x$ (resp. $y$) values of the given point set. We can always pick a median
whose coordinates are from the coordinates in $S$. There is a unique median if $k$
is odd; if $k$ is even, possible median coordinates may form intervals.
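The coordinate-wise characterization of the $L_1$ median can be illustrated with a short Python sketch. It uses `median_low` from the standard `statistics` module, which always returns one of the input values, matching the remark that a median can use coordinates from $S$; the helper names are illustrative.

```python
from statistics import median_low
from typing import List, Tuple

Point = Tuple[int, int]


def l1_median(points: List[Point]) -> Point:
    """An L1 median: the median x and the median y, taken independently.
    median_low returns an input value, so both coordinates come from S."""
    return (median_low([p[0] for p in points]), median_low([p[1] for p in points]))


def total_distance(c: Point, points: List[Point]) -> int:
    """Total L1 distance from c to the given points."""
    return sum(abs(c[0] - p[0]) + abs(c[1] - p[1]) for p in points)
```

For an even number of points, any point in the box spanned by the low and high medians of each coordinate attains the same total.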

The natural greedy algorithm for our clustering problem is as follows: 



Consider the set $I$ containing the $O(n^2)$ intersection points of the horizontal
and vertical lines through the points of input $P$. For each point $p \in I$ do:

1. Take the $k$ points closest to $p$ (using the $L_1$ metric), breaking ties
arbitrarily.

2. Compute the total pairwise distance between all $k$ points.

Return the set of $k$ points with smallest total pairwise distance.
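The MM procedure above translates directly into Python. The sketch below also includes an exhaustive reference solver for small inputs; it makes no attempt at efficiency, and ties are broken by input order, which is one admissible instance of "arbitrarily".

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Point = Tuple[int, int]


def total_l1(points: Sequence[Point]) -> int:
    """Sum of L1 distances over all unordered pairs."""
    return sum(abs(p[0] - q[0]) + abs(p[1] - q[1]) for p, q in combinations(points, 2))


def manhattan_median(P: List[Point], k: int) -> List[Point]:
    """MM: try every intersection of a vertical and a horizontal line through
    input points as a center, take the k input points nearest to it in L1,
    and return the candidate set with the smallest total pairwise distance."""
    best: List[Point] = []
    best_cost: Optional[int] = None
    for cx in sorted({p[0] for p in P}):
        for cy in sorted({p[1] for p in P}):
            # sorted() is stable, so ties are broken by input order.
            cand = sorted(P, key=lambda p: abs(p[0] - cx) + abs(p[1] - cy))[:k]
            cost = total_l1(cand)
            if best_cost is None or cost < best_cost:
                best, best_cost = cand, cost
    return best


def brute_force_opt(P: List[Point], k: int) -> int:
    """Exact optimum by trying every k-subset (exponential; small inputs only)."""
    return min(total_l1(s) for s in combinations(P, k))
```

On small random instances, `total_l1(manhattan_median(P, k))` should never exceed `1.75 * brute_force_opt(P, k)`, in line with the guarantee proved below.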



We call this strategy MM, for Manhattan Median. We prove that MM is a $\frac{7}{4}$-approximation
on 2D meshes. (Note that Krumke et al. [16] call a minor variation
of this algorithm Gen-Alg and show it is a 2-approximation in arbitrary metric 
spaces.) 

For $S \subseteq P$, let $|S|$ denote the sum of $L_1$ distances between points in $S$. For
a point $p$ in the plane, we use $p_x$ and $p_y$ to denote its $x$- and $y$-coordinates,
respectively.

Lemma 1. MM is not better than a 7/4 approximation. 

Proof. For a class of examples establishing the lower bound, consider the situation
shown in Figure 4. For any $\varepsilon > 0$, it has clusters of $k/2$ points at $(0, 0)$ and
$(1, 0)$. In addition, it has clusters of $k/8$ points at $(0, \pm(1 - \varepsilon))$, $(1, \pm(1 - \varepsilon))$,
$(2 - \varepsilon, 0)$, and $(-1 + \varepsilon, 0)$. The best choices of median are $(0, 0)$ and $(1, 0)$, which
yield a total distance of $7k^2(1 - O(\varepsilon))/16$. The optimal solution is the points at
$(0, 0)$ and $(1, 0)$, which yield a total distance of $k^2/4$. □



[Figure 4: clusters of $k/2$ points at $(0, 0)$ and $(1, 0)$; clusters of $k/8$ points at $(-1 + \varepsilon, 0)$, $(2 - \varepsilon, 0)$, $(0, \pm(1 - \varepsilon))$ and $(1, \pm(1 - \varepsilon))$.]

Fig. 4. A class of examples where MM yields a ratio of 7/4.
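The lower-bound construction can be evaluated numerically. The sketch below builds the Figure 4 instance with multiplicities, runs MM over all candidate centers, and computes the optimum exactly by choosing how many points to take from each location; the values $k = 16$ and $\varepsilon = 10^{-3}$ are arbitrary choices for illustration.

```python
from itertools import product
from typing import List, Tuple

Point = Tuple[float, float]
Groups = List[Tuple[Point, int]]  # (location, number of points stacked there)


def total_l1(points: List[Point]) -> float:
    """Sum of pairwise L1 distances via sorted coordinates."""
    m, total = len(points), 0.0
    for axis in (0, 1):
        coords = sorted(p[axis] for p in points)
        total += sum((2 * i - m + 1) * c for i, c in enumerate(coords))
    return total


def fig4_groups(k: int, eps: float) -> Groups:
    """The instance of Figure 4; k is assumed to be a multiple of 8."""
    a = 1 - eps
    light = [(0.0, a), (0.0, -a), (1.0, a), (1.0, -a), (2 - eps, 0.0), (-a, 0.0)]
    return [((0.0, 0.0), k // 2), ((1.0, 0.0), k // 2)] + [(p, k // 8) for p in light]


def mm_cost(groups: Groups, k: int) -> float:
    """MM: the k nearest points around every intersection of coordinate lines."""
    pts = [p for p, mult in groups for _ in range(mult)]
    best = float("inf")
    for cx in {p[0] for p in pts}:
        for cy in {p[1] for p in pts}:
            near = sorted(pts, key=lambda p: abs(p[0] - cx) + abs(p[1] - cy))[:k]
            best = min(best, total_l1(near))
    return best


def opt_cost(groups: Groups, k: int) -> float:
    """Exact optimum: enumerate how many points to take from each location."""
    best = float("inf")
    for counts in product(*(range(mult + 1) for _, mult in groups)):
        if sum(counts) == k:
            chosen = [p for (p, _mult), c in zip(groups, counts) for _ in range(c)]
            best = min(best, total_l1(chosen))
    return best


k, eps = 16, 1e-3
g = fig4_groups(k, eps)
print(mm_cost(g, k) / opt_cost(g, k))  # about 1.7487, i.e. 7/4 - O(eps)
```

Here MM's best set costs $112 - 84\varepsilon = 7k^2(1 - O(\varepsilon))/16$ and the optimum costs $64 = k^2/4$.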



Now we show that 7/4 is indeed the worst-case bound. We focus on pos- 
sible worst-case arrangements and use local optimality to restrict the possible 
arrangements until the claim follows. 

Let OPT be a subset of $P$ of size $k$ for which $|OPT|$ is minimum. Without
loss of generality assume that the origin is a median point of OPT. This means
that at most $k/2$ points of OPT have positive $x$-coordinates (similarly negative
$x$-coordinates, positive $y$-coordinates, and negative $y$-coordinates). Let MM be
the set of k points closest to the origin. Since this is one candidate solution for 
the algorithm, its sum of pairwise distances is at least as high as that of the 
solution returned by the algorithm. 

Without loss of generality, assume that the largest $L_1$ distance of a point
in MM to the origin is 1, so MM lies in the $L_1$ unit circle $C$. (Note that $C$ is
diamond-shaped.) We say that points are either inside C, on C, or outside C. 
All points of P inside C are in MM and at least some points on C are in MM. If 
there are more than k points on and inside C, we select all points inside C plus 
those points on C maximizing |MM|. 

Clearly $1 \le |MM|/|OPT|$. Let $\rho_k$ be the supremum of $|MM|/|OPT|$ over all
inputs $P$. By assuming that ties are broken badly, we can assume that there is
an input for which $|MM|/|OPT| = \rho_k$:

Lemma 2. For any $n$ and $k$, there are point sets $P$ with $|P| = n$ for which
$|MM|/|OPT|$ attains the value $\rho_k$.

Proof. The set of arrangements of $n$ points in the unit circle $C$ is a compact set
in $2n$-dimensional space. By our assumption on breaking ties, $|MM|/|OPT|$ is
upper semicontinuous, so it attains a maximum. □

We show that $|MM|$ is at most 7/4 times $|OPT|$.
Theorem 1. MM is a 7/4-approximation algorithm for minimizing the sum of
pairwise $L_1$ distances in a 2D mesh.

Proof. For ease of presentation, we assume without loss of generality that
$P = MM \cup OPT$. Let $B = OPT \cap MM$, $O = OPT - B$ and $A = MM - B$.

Claim 0: No point $p \in O$ lies outside $C$.

If a point $p \in O$ lies outside $C$, we can move it a little closer to the origin
without entering $C$. Since it remains outside $C$, the point does not become part
of MM, so $|OPT|$ is reduced, $|MM|$ remains the same and the ratio $|MM|/|OPT|$
increases, which is impossible.

Claim 1: All points inside $C$ are in MM.

It follows from the definition of MM that all points inside C are in MM. 
Notice that this implies that no point $p \in O$ can lie inside $C$.

Claim 2: Without loss of generality, we may assume that the origin is also 
a median of MM. 

Suppose that the origin is not a median of MM. We consider the case when
more than $k/2$ points of MM have positive $y$-coordinate; the other cases are
handled analogously. We set the $y$-coordinate of the point in MM with smallest
positive $y$-coordinate to zero. By assumption, this causes the point to move
away from at least as many points of MM as it moves toward. Thus, $|MM|$
does not decrease. The origin is a median of OPT, so $|OPT|$ does not increase.
Therefore, the ratio $|MM|/|OPT|$ cannot decrease. Since the ratio cannot increase
by assumption, it must remain the same. Thus, we have constructed a point set
achieving $|MM|/|OPT| = \rho_k$ with one fewer point having positive $y$-coordinate.
Repeating this process will make some point on the line $y = 0$ a median.

Claim 3: No point $p \in A$ lies inside $C$.

Suppose there is a $p \in A$ that lies inside $C$. Moving $p$ away from the origin
increases $|MM|$ because $p$ is moved further away from the median of MM. Since
$p \notin OPT$, $|OPT|$ does not increase, although it may decrease. So $|MM|/|OPT|$
increases, which is impossible. This implies that all points inside $C$ are in $B$ and
that points from $A$ and $O$ lie on the boundary of $C$.

Claim 4: Without loss of generality, we may assume that all points $p \in A$
on $C$ lie in a corner of $C$.

Suppose $p \in A$ lies on an edge of $C$ but not in a corner. Let $D$ be the sum of
the $L_1$ distances from $p$ to all points in $MM - p$. Consider the set $Q$ of all points $q$
for which the sum of the $L_1$ distances from $q$ to all points in $MM - p$ is at most $D$.
The sum of distances is a sum of convex functions, so it is also a convex function,
and the set $Q$ is a convex polygon through $p$. Therefore, we can move $p$ along
the edge of $C$ on which it lies so that it either moves outside of $Q$ or remains on
the boundary of $Q$. In the former case, $|MM|$ increases. In the latter, $|MM|$ remains
the same. In either case, $|OPT|$ stays the same or decreases. If $|MM|$ increases
and/or $|OPT|$ decreases, $|MM|/|OPT|$ increases, which is impossible. If both stay
the same, we can move $p$ until it reaches a corner of $C$. For an illustration of
what the configuration may look like, see Figure 5(a).

Claim 5: Without loss of generality, we may assume that all points in $O \cup B$
lie in a corner of $C$ or on the origin.
[Figure 5 legend: • one or more points from $A$; o one or more points from $O$; + one or more points from $B$. Panels (a) and (b).]

Fig. 5. Points of $A$, $O$ and $B$ (a) after Claim 4 and (b) during the motion used in
Claim 5.



We prove the claim by contradiction. Suppose there is a set of points $S$
for which the claim is false. Let $p \in O \cup B$ be a point that does not lie in a
corner of $C$ or on the origin. Let $S(p)$ be the points that lie on the axis-parallel
rectangle through $p$ with corners on $C$. The set $S(p)$ is illustrated in Figure
5(b). We move the points in $S(p)$ simultaneously in such a way that they stay
on an axis-parallel rectangle with corners on $C$. For example, we move all points
in $S(p)$ with maximal $y$-coordinates but not on $C$ upwards by $\varepsilon$. We move all
points in $S(p)$ with maximal $y$-coordinates and on $C$ upwards while remaining
on $C$. Similarly, the other points of $S(p)$ move either left, right or down. We
choose $\varepsilon$ small enough such that no point from $S \setminus S(p)$ enters the rectangle
on which $S(p)$ lies. This move changes $|MM|$ by some amount $\delta_A$ and $|OPT|$ by
some amount $\delta_O$. However, if we move all points in the opposite direction (i.e.,
points with maximal $y$-coordinates downwards, etc.), $|MM|$ and $|OPT|$ change
by $-\delta_A$ and $-\delta_O$, respectively. So if $\delta_A/\delta_O \ne \rho_k$, one of these two moves increases
$|MM|/|OPT|$, which is impossible. If $\delta_A/\delta_O = \rho_k$, we keep moving the points
in the same direction until there is a combinatorial change, i.e., a point from
$S \setminus S(p)$ enters the rectangle on which $S(p)$ lies, a point in $S(p)$ reaches $C$, or the
rectangle collapses into a line. Each combinatorial change decreases the number
of rectangles on which the points lie, increases the number of points on $C$, or
moves points to one of the coordinate axes. Since none of these changes is ever
undone, we can then repeat this argument until all points of $S$ lie on a corner of
$C$ or on the origin.

We can now complete the proof of Theorem 1. Let $b$ denote the number of
points at the origin. These points are all in $B = OPT \cap MM$ since they were
originally inside $C$. Let $a_0, a_1, a_2, a_3$ and $o_0, o_1, o_2, o_3$ be the numbers of points of MM and
OPT at the north, east, south and west corners of $C$, respectively. Since distinct
corners of $C$ are at distance 2 from each other and at distance 1 from the origin, the value
of $|MM|$ is

$$|MM| = 2\sum_{0 \le i < j \le 3} a_i a_j + \sum_{0 \le i \le 3} b\,a_i = 2\sum_{0 \le i < j \le 3} a_i a_j + b(k - b),$$

which is maximal when each value $a_i$ is equal to $\lceil (k - b)/4 \rceil$ or $\lfloor (k - b)/4 \rfloor$ (in particular,
$\sum_{i<j} a_i a_j \le \frac{3}{8}(k - b)^2$). The value of $|OPT|$ is $2\sum_{0 \le i < j \le 3} o_i o_j + b(k - b)$, which is minimal when $o_0 = k - b$
and $o_1 = o_2 = o_3 = 0$. The origin must be a median of OPT since none of our
transformations move a point between quadrants. Thus, if $b < k/2$, the minimum
value for $|OPT|$ occurs when $o_0 = k/2$ and $o_1 = k/2 - b$. So if $b < k/2$ we have

$$\frac{|MM|}{|OPT|} \le \frac{\frac{3}{4}(k - b)^2 + b(k - b)}{2 \cdot \frac{k}{2}\left(\frac{k}{2} - b\right) + b(k - b)},$$

from which it follows that

$$\frac{|MM|}{|OPT|} \le \frac{(k - b)(3k + b)}{2(k^2 - 2b^2)}.$$

This is a convex function of $b$ in the interval $0 \le b < k/2$ whose values are smaller than
7/4.

If $b \ge k/2$ we have

$$\frac{|MM|}{|OPT|} \le \frac{\frac{3}{4}(k - b)^2 + b(k - b)}{b(k - b)} = 1 + \frac{3(k - b)}{4b},$$

which is maximal when $b = k/2$, in which case $|MM|/|OPT| = 7/4$. Notice that $n$ has to be at least
$11k/8$ for this value to be obtained, since we need $a_i = k/8$ for all $i$ and $o_0 = k/2$,
where MM and OPT can share the points in the north corner of $C$. For smaller
values of $n$ we can add extra points to the corners of $C$ until $n = 11k/8$, so $|MM|$
increases and $|OPT|$ decreases. Since $|MM|/|OPT| = 7/4$ when $n = 11k/8$, we have
$|MM|/|OPT| \le 7/4$ for all values of $k$. Therefore the theorem holds. □
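The closing case analysis can be double-checked by brute force for a fixed $k$. The sketch below enumerates only the configurations left after Claims 0 to 5 ($b$ points at the origin, the rest on the four corners of $C$, at most $k/2$ per corner because the origin is a median of both sets) and, as in the proof, maximizes $|MM|$ and minimizes $|OPT|$ independently for each $b$. Exact rational arithmetic avoids rounding in the comparison.

```python
from fractions import Fraction
from itertools import product
from typing import Iterator, Tuple

Corners = Tuple[int, int, int, int]


def splits(total: int, cap: int) -> Iterator[Corners]:
    """All ways to place `total` points on the four corners of C, at most `cap` each."""
    for head in product(range(cap + 1), repeat=3):
        last = total - sum(head)
        if 0 <= last <= cap:
            yield head + (last,)


def weight(b: int, corners: Corners) -> int:
    """Sum of pairwise L1 distances for b points at the origin plus corners[i]
    points at corner i: distinct corners are 2 apart, the origin is 1 from each."""
    cross = sum(corners[i] * corners[j] for i in range(4) for j in range(i + 1, 4))
    return 2 * cross + b * sum(corners)


def worst_ratio(k: int) -> Tuple[Fraction, int]:
    """Largest (max |MM|) / (min |OPT|) over b = 0, ..., k - 1, and the b attaining it."""
    best, best_b = Fraction(0), -1
    for b in range(k):
        mm = max(weight(b, c) for c in splits(k - b, k // 2))
        opt = min(weight(b, c) for c in splits(k - b, k // 2))
        if Fraction(mm, opt) > best:
            best, best_b = Fraction(mm, opt), b
    return best, best_b
```

For $k = 16$ this reports a ratio of exactly $7/4$, attained at $b = k/2 = 8$.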



2.2 Analysis of MC1x1

MC was originally presented as a heuristic algorithm, but we prove that
MC1x1 has approximation ratio $(2 - 2/k)d$ in dimension $d$. Krumke et al. [16]
used the same ideas to prove that a variant of MM is a $(2 - 2/k)$-approximation
algorithm; their argument also applies to MM.

Theorem 2. MC1x1 is a $(2 - 2/k)d$-approximation algorithm for minimizing
the sum of pairwise $L_1$ distances in a $d$-dimensional mesh.

Proof. Recall that MC1x1 minimizes the sum of the selected points' shell numbers.
Let point $v$ be the center of the shells for the selected allocation and let $\sigma$
be the sum of the shell numbers for points of MC1x1. First, we bound $|MC1x1|$
in terms of $\sigma$. The total distance from $v$ to the points of MC1x1 is at most $\sigma d$
since a point in shell $i$ is at most $id$ steps from $v$. Thus, $|MC1x1| \le (k - 1)\sigma d$,
since this is the distance if all paths are routed through $v$.

Now we bound $|OPT|$ in terms of $\sigma$. For this, we use the concept of a star,
which is a set of points with one identified as its center. The length of a star
is the total distance between the center and its other points. The smallest star
with $k$ points has length at least $\sigma$ since a point at distance $i$ from the star's center
lies in the $i$-th shell around that center or in an inner one. Thus, the total distance from one point of
OPT to the others is at least $\sigma$. Since summing the lengths of stars of OPT with
each point as the center counts the distance between each pair of points twice,
$|OPT| \ge k\sigma/2$, and the theorem follows by combining our bounds. □
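Written out as one chain, with $v$ the shell center and $\sigma$ the shell-number sum of the allocation chosen by MC1x1, $d(\cdot,\cdot)$ the $L_1$ distance, and the sums ranging over pairs and points of that allocation, the two bounds in the proof combine as follows:

$$
\begin{aligned}
|\mathrm{MC1x1}| &= \sum_{\{p,q\}} d(p,q) \le \sum_{\{p,q\}} \bigl(d(p,v) + d(v,q)\bigr) = (k-1)\sum_{p} d(p,v) \le (k-1)\,\sigma d,\\
|OPT| &\ge \frac{k\sigma}{2},\\
\frac{|\mathrm{MC1x1}|}{|OPT|} &\le \frac{(k-1)\,\sigma d}{k\sigma/2} = \Bigl(2 - \frac{2}{k}\Bigr)d.
\end{aligned}
$$

The middle equality holds because each of the $k$ points occurs in exactly $k - 1$ pairs.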



2.3 Fast Algorithm for $k = 3$

Theorem 3. Let $P$ be a set of $n$ points in the plane. The subset of $P$ of size 3
with minimum total pairwise $L_1$ distance can be found in $O(n \log n)$ time.

Proof. Let $S = \{s_0, s_1, s_2\}$ be a subset of $P$. Label the $x$- and $y$-coordinates of
a point $s \in S$ with $(x_a, y_b)$ with $0 \le a < 3$ and $0 \le b < 3$ so that $x_0 \le x_1 \le x_2$
and $y_0 \le y_1 \le y_2$. The total pairwise $L_1$ distance of $S$ is $2(x_2 - x_0) + 2(y_2 - y_0)$.
Consider the smallest Steiner star of $S$, which has center $(x_1, y_1)$. Its length is
$(x_2 - x_0) + (y_2 - y_0)$. Since the total pairwise distance and the length of the smallest
Steiner star are constant multiples of each other, the subset of size 3 having
minimum total pairwise distance also has the smallest Steiner star.

Let $c$ be the center of the smallest Steiner star of 3 points of $P$. By the
discussion above, the three points having this Steiner star also have minimum
total pairwise distance. These points are the three closest points to $c$, or there
would have been a smaller Steiner star. Therefore, these points correspond to a
cell of the order-3 Voronoi diagram of $P$. Since this diagram can be found in
$O(n \log n)$ time [17], the theorem follows. □
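The identity behind the proof (total pairwise distance equals twice the smallest Steiner-star length, for every triple) is easy to check numerically. The sketch below is only an $O(n^3)$ reference for testing; it does not implement the order-3 Voronoi diagram that yields the $O(n \log n)$ bound, and the helper names are illustrative.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def pairwise_l1(s: Sequence[Point]) -> int:
    """Total pairwise L1 distance of a point set."""
    return sum(abs(p[0] - q[0]) + abs(p[1] - q[1]) for p, q in combinations(s, 2))


def steiner_star_length(s: Sequence[Point]) -> int:
    """Length of the smallest L1 Steiner star of three points: its center is the
    coordinate-wise median (x_1, y_1), so the length is x_2 - x_0 + y_2 - y_0."""
    xs = sorted(p[0] for p in s)
    ys = sorted(p[1] for p in s)
    return (xs[2] - xs[0]) + (ys[2] - ys[0])


def best_triple_bruteforce(P: List[Point]) -> Tuple[Point, ...]:
    """O(n^3) reference: the triple with the smallest Steiner star."""
    return min(combinations(P, 3), key=steiner_star_length)
```

Because the two quantities differ by the constant factor 2, minimizing either one selects the same triple.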

3 PTAS for Two Dimensions 

Let $w(S, T)$ be the sum of all the distances from points in $S$ to points in $T$.
Let $w_x(S, T)$ and $w_y(S, T)$ be the sum of $x$- and $y$-distances from points in $S$ to
points in $T$, respectively. So $w(S, T) = w_x(S, T) + w_y(S, T)$. Let $w(S) = w(S, S)$,
$w_x(S) = w_x(S, S)$, and $w_y(S) = w_y(S, S)$. We call $w(S)$ the weight of $S$.

Let $S = \{s_0, s_1, \ldots, s_{k-1}\}$ be a minimum-weight subset of $P$, where $k$ is an
integer greater than 1. We label the $x$- and $y$-coordinates of a point $s \in S$ by
some $(x_a, y_b)$ with $0 \le a < k$ and $0 \le b < k$ such that $x_0 \le x_1 \le \cdots \le x_{k-1}$ and
$y_0 \le y_1 \le \cdots \le y_{k-1}$. (Note that in general, $a \ne b$ for a point $s = (x_a, y_b)$.) We
can derive the following equations:

$$w_x(S) = (k - 1)(x_{k-1} - x_0) + (k - 3)(x_{k-2} - x_1) + \cdots,$$
$$w_y(S) = (k - 1)(y_{k-1} - y_0) + (k - 3)(y_{k-2} - y_1) + \cdots.$$

We show that there is a polynomial-time approximation scheme (PTAS), i.e., for
any fixed positive $m = 1/\varepsilon$, there is a polynomial approximation algorithm that
finds a solution within $(1 + \varepsilon)$ of the optimum.
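The nested form of $w_x(S)$ can be checked against the pairwise definition. The sketch below counts each unordered pair once, which is the convention the displayed formula follows; the function names are illustrative.

```python
from itertools import combinations
from typing import List


def wx_pairs(xs: List[int]) -> int:
    """Direct definition: sum of |x_a - x_b| over unordered pairs."""
    return sum(abs(a - b) for a, b in combinations(xs, 2))


def wx_nested(xs: List[int]) -> int:
    """(k-1)(x_{k-1} - x_0) + (k-3)(x_{k-2} - x_1) + ... on the sorted values;
    for odd k the middle value has coefficient 0 and is skipped."""
    s = sorted(xs)
    k = len(s)
    return sum((k - 1 - 2 * i) * (s[k - 1 - i] - s[i]) for i in range(k // 2))
```

For instance, both functions give 24 for the values `[0, 1, 5, 9]`: the nested form is $3 \cdot 9 + 1 \cdot 4 - 3 \cdot 0 - 1 \cdot 1$ rearranged as $3(9 - 0) + 1(5 - 1) = 31$? No: it is $3(9 - 0) + 1(5 - 1) = 31$ only if the pairs are miscounted, so the check is worth running; the direct sum is $1 + 5 + 9 + 4 + 8 + 4 = 31$ as well.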

The basic idea is similar to the one used by Fekete and Meijer [11] to select a
set of points maximizing the overall distance: We find (by enumeration) a subdivision
of an optimal solution into $m \times m$ rectangular cells $C_{ij}$, each containing a
specific number $k_{ij}$ of selected points. The points from each cell $C_{ij}$ are selected
in a way that minimizes the total distance to all other cells except for the $m - 1$
cells in the same "horizontal" strip or the $m - 1$ cells in the same "vertical" strip.
As it turns out, this can be done in a way that the total neglected distance within
the strips is bounded by a small fraction of the weight of an optimal solution,
yielding the desired approximation property. See Figure 6 for the setup.

For ease of presentation, we assume that $k$ is a multiple of $m$ and $m > 2$.
Approximation algorithms for other values of $k$ can be constructed in a similar
fashion. Consider a division of the plane by a set of $m + 1$ $x$-coordinates $\xi_0 < \xi_1 <
\cdots < \xi_m$. Let $X_i := \{p = (x, y) \mid \xi_i \le x < \xi_{i+1}\}$ be the vertical strip between
coordinates $\xi_i$ and $\xi_{i+1}$. By enumeration of possible values of $\xi_0, \ldots, \xi_m$, we may
assume that each of the $m$ strips $X_i$ contains precisely $k/m$ points of an optimal
solution. (A small perturbation does not change optimality or approximation
properties of solutions. Thus, without loss of generality, we assume that no pair
of points share either $x$-coordinate or $y$-coordinate.)

[Figure 6: the point set divided into strips; strip boundaries labeled $\xi_0, \xi_1, \xi_2, \ldots$]

Fig. 6. Dividing the point set into horizontal and vertical strips.

In a similar manner, assume we know $m + 1$ $y$-coordinates $\eta_0 < \eta_1 < \cdots < \eta_m$
so that an optimal solution has precisely $k/m$ points in each horizontal strip
$Y_i := \{p = (x, y) \mid \eta_i \le y < \eta_{i+1}\}$.

Let $C_{ij} := X_i \cap Y_j$, and let $k_{ij}$ be the number of points in OPT that are
chosen from $C_{ij}$. Since for all $i, j \in \{0, 1, \ldots, m - 1\}$,

$$\sum_{0 \le l < m} k_{il} = \sum_{0 \le l < m} k_{lj} = k/m,$$

we may assume by enumeration over the $O(k^{m^2})$ possible partitions of $k/m$ into
$m$ pieces that we know all the numbers $k_{ij}$.

Finally, define the vector $V_{ij} := ((2i + 1 - m)k/m, (2j + 1 - m)k/m)$. Our
approximation algorithm is as follows: from each cell $C_{ij}$, choose $k_{ij}$ points that
are minimum in direction $V_{ij}$, i.e., select points $p = (x, y)$ for which $\langle p, V_{ij}\rangle =
x(2i + 1 - m)k/m + y(2j + 1 - m)k/m$ is minimum. For an illustration, see Figure 7.
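The per-cell selection rule is short in code. The sketch below assumes that the cell's points and the enumerated count $k_{ij}$ are already known; the enumeration over strip boundaries and counts, which dominates the running time, is omitted, and `select_in_cell` is a hypothetical helper name.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def select_in_cell(cell_points: List[Point], i: int, j: int, m: int, k: int,
                   k_ij: int) -> List[Point]:
    """Return the k_ij points of cell C_ij that are minimal in direction
    V_ij = ((2i + 1 - m) k/m, (2j + 1 - m) k/m), i.e. with smallest <p, V_ij>.
    Strips are indexed 0, ..., m - 1 from left to right and bottom to top."""
    vx = (2 * i + 1 - m) * k / m
    vy = (2 * j + 1 - m) * k / m
    return sorted(cell_points, key=lambda p: p[0] * vx + p[1] * vy)[:k_ij]
```

A cell in the leftmost strip ($i = 0$) gets a negative $x$-component in $V_{ij}$ and therefore prefers points far to the right, toward the $(m - 1)k/m$ solution points in the other vertical strips.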

It can be shown that selecting points of Cij this way minimizes the sum of 
x-distances to points not in Xi and the sum of y-distances to points not in Yj . 
Technical details are described in the following. We summarize: 

Theorem 4. The problem of selecting a subset of minimum total $L_1$ distance
for a set of points in $\mathbb{R}^2$ allows a PTAS.



Correctness of the PTAS 

Let MM be the point set selected by the algorithm described in Section 3.
It is clear that MM can be computed in polynomial time. We will proceed by a
series of lemmas to determine how well $w(MM)$ approximates $w(OPT)$. In the
following, we consider the distances involving points from a particular cell $C_{ij}$.
Let $MM_{ij}$ be the set of $k_{ij}$ points that are selected from $C_{ij}$ by the heuristic,
and let $OPT_{ij}$ be a set of $k_{ij}$ points of an optimal solution that are attributed
to $C_{ij}$. Let $MM_{i\cdot}$, $OPT_{i\cdot}$, $MM_{\cdot j}$ and $OPT_{\cdot j}$ be the sets of $k/m$ points selected
from $X_i$ and $Y_j$ by the heuristic and an optimal algorithm, respectively. Finally,
$\overline{MM}_{i\cdot} := MM \setminus MM_{i\cdot}$, $\overline{MM}_{\cdot j} := MM \setminus MM_{\cdot j}$, $\overline{OPT}_{i\cdot} := OPT \setminus OPT_{i\cdot}$ and
$\overline{OPT}_{\cdot j} := OPT \setminus OPT_{\cdot j}$.

Fig. 7. Selecting points in cell $C_{12}$.

For the rest of the notation, notice that

$$w(MM) = \sum_{i,j}\left[w_x(MM_{ij}, \overline{MM}_{i\cdot}) + w_y(MM_{ij}, \overline{MM}_{\cdot j})\right] + \sum_i w_x(MM_{i\cdot}) + \sum_j w_y(MM_{\cdot j}).$$

We first show that the first part is at most $w(OPT)$. We then show that
the second and third parts are small fractions of $w(MM)$.

Lemma 3.

$$w_x(MM_{ij}, \overline{MM}_{i\cdot}) + w_y(MM_{ij}, \overline{MM}_{\cdot j}) \le w_x(OPT_{ij}, \overline{OPT}_{i\cdot}) + w_y(OPT_{ij}, \overline{OPT}_{\cdot j}).$$

Proof. Consider a point $p \in OPT_{ij} \setminus MM_{ij}$. We will replace it with an arbitrary
point $p' \in MM_{ij} \setminus OPT_{ij}$ that was chosen by the heuristic instead of $p$. Let
$p - p' = h = (h_x, h_y)$. When replacing $p'$ in MM by $p$, we increase the $x$-distance
to the $ik/m$ points left of $C_{ij}$ by $h_x$, while decreasing the $x$-distance to the
$(m - i - 1)k/m$ points right of $C_{ij}$ by $h_x$. In the balance, this yields a change
of $((2i + 1 - m)k/m)h_x$. Similarly, we get a change of $((2j + 1 - m)k/m)h_y$ for
the $y$-coordinates. Since $p'$ was chosen to minimize the inner product $\langle p', V_{ij}\rangle$,
we know that the inner product $\langle h, V_{ij}\rangle \ge 0$, so the overall change of distances
is nonnegative.

Performing these replacements for all points in $MM \setminus OPT$, we can transform
MM to OPT, and the sum $w_x(MM_{ij}, \overline{MM}_{i\cdot}) + w_y(MM_{ij}, \overline{MM}_{\cdot j})$ never decreases on the way to
the sum $w_x(OPT_{ij}, \overline{OPT}_{i\cdot}) + w_y(OPT_{ij}, \overline{OPT}_{\cdot j})$. □
Corollary 1.

$$\sum_{i,j}\left[w_x(MM_{ij}, \overline{MM}_{i\cdot}) + w_y(MM_{ij}, \overline{MM}_{\cdot j})\right] \le w(OPT).$$

In the following two lemmas we show that $\sum_i w_x(MM_{i\cdot})$ is a small fraction of
$w(MM)$. Analogous proofs can be given for $\sum_j w_y(MM_{\cdot j})$.

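Lemma 4.

$$\sum_{0 < i < m-1} w_x(MM_{i\cdot}) \le \frac{1}{m-2}\, w_x(MM).$$

Proof. Let $\delta_i = \xi_{i+1} - \xi_i$. Every pair of points in $X_i$ has $x$-distance less than $\delta_i$,
and there are fewer than $(k/m)^2$ such pairs. Since $i(m - i - 1) \ge m - 2$ for $0 < i < m - 1$, we
have

$$w_x(MM_{i\cdot}) \le \left(\frac{k}{m}\right)^2 \delta_i \le \frac{i(m - i - 1)}{m - 2}\left(\frac{k}{m}\right)^2 \delta_i \quad \text{for } 0 < i < m - 1.$$

Since MM has $ik/m$ and $(m - i - 1)k/m$ points to the left and to the right of $X_i$, respectively,
we have

$$w_x(MM) \ge \sum_{0 < i < m-1} \frac{ik}{m} \cdot \frac{(m - i - 1)k}{m}\, \delta_i,$$

so

$$\sum_{0 < i < m-1} w_x(MM_{i\cdot}) \le \frac{1}{m-2}\, w_x(MM). \qquad \square$$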



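Lemma 5. For $i = 0$ and $i = m - 1$ we have $w_x(MM_{i\cdot}) \le \frac{1}{m-1}\, w_x(MM)$.

Proof. Without loss of generality assume $i = 0$. Let $x_0, x_1, \ldots, x_{(k/m)-1}$ be the
$x$-coordinates of the points $p_0, p_1, \ldots, p_{(k/m)-1}$ in $MM_{0\cdot}$. So

$$
\begin{aligned}
w_x(MM_{0\cdot}) &= \left(\tfrac{k}{m} - 1\right)(x_{k/m-1} - x_0) + \left(\tfrac{k}{m} - 3\right)(x_{k/m-2} - x_1) + \cdots\\
&\le \left(\tfrac{k}{m} - 1\right)(\xi_1 - x_0) + \left(\tfrac{k}{m} - 3\right)(\xi_1 - x_1) + \cdots\\
&\le \tfrac{k}{m}(\xi_1 - x_0) + \tfrac{k}{m}(\xi_1 - x_1) + \cdots + \tfrac{k}{m}(\xi_1 - x_{k/m-1}).
\end{aligned}
$$

Since $\xi_1 - x_j \le x - x_j$, where $0 \le j < k/m$ and $x$ is the $x$-coordinate of
any point in $\overline{MM}_{0\cdot}$, and since there are $(m - 1)k/m$ points in $\overline{MM}_{0\cdot}$, we have
$\frac{(m-1)k}{m}(\xi_1 - x_j) \le w_x(p_j, \overline{MM}_{0\cdot})$, so

$$
\begin{aligned}
w_x(MM_{0\cdot}) &\le \frac{k}{m} \cdot \frac{m}{(m-1)k} \sum_j w_x(p_j, \overline{MM}_{0\cdot})\\
&= \frac{1}{m-1} \sum_j w_x(p_j, \overline{MM}_{0\cdot})\\
&= \frac{1}{m-1}\, w_x(MM_{0\cdot}, \overline{MM}_{0\cdot})\\
&\le \frac{1}{m-1}\, w_x(MM). \qquad \square
\end{aligned}
$$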
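Both strip lemmas depend only on the strip structure, not on how MM was chosen, so they can be spot-checked on random one-dimensional data. The sketch below splits sorted $x$-values into $m$ consecutive groups of $k/m$ values, which play the role of the strips $X_i$, and counts each unordered pair once; the helper names are illustrative.

```python
from itertools import combinations
from typing import List, Tuple


def w(xs: List[float]) -> float:
    """Sum of |a - b| over unordered pairs."""
    return sum(abs(a - b) for a, b in combinations(xs, 2))


def strip_weights(xs: List[float], m: int) -> Tuple[float, List[float]]:
    """Return w over all values and w inside each of m consecutive strips of
    sorted values; len(xs) is assumed to be a multiple of m."""
    s = sorted(xs)
    size = len(s) // m
    return w(s), [w(s[t * size:(t + 1) * size]) for t in range(m)]
```

With `total, per_strip = strip_weights(xs, m)`, Lemma 4 predicts `sum(per_strip[1:m - 1]) <= total / (m - 2)`, and Lemma 5 predicts that `per_strip[0]` and `per_strip[-1]` are each at most `total / (m - 1)`.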

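Combining Corollary 1 with Lemma 4 and Lemma 5 (the latter applied to both outer strips $i = 0$ and $i = m - 1$), we get the claimed result and the proof of
Theorem 4:

$$
\begin{aligned}
w(MM) &= \sum_{i,j}\left[w_x(MM_{ij}, \overline{MM}_{i\cdot}) + w_y(MM_{ij}, \overline{MM}_{\cdot j})\right] + \sum_i w_x(MM_{i\cdot}) + \sum_j w_y(MM_{\cdot j})\\
&\le w(OPT) + \frac{1}{m-2}\bigl(w_x(MM) + w_y(MM)\bigr) + \frac{2}{m-1}\bigl(w_x(MM) + w_y(MM)\bigr)\\
&= w(OPT) + \frac{1}{m-2}\, w(MM) + \frac{2}{m-1}\, w(MM).
\end{aligned}
$$

So $w(MM)\left(1 - \frac{1}{m-2} - \frac{2}{m-1}\right) \le w(OPT)$. Since $m = 1/\varepsilon$, the resulting factor $1/\left(1 - \frac{1}{m-2} - \frac{2}{m-1}\right)$ is $1 + O(\varepsilon)$.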

4 Higher-Dimensional Spaces 

Using the same techniques, we also generalize our results to higher dimen- 
sions. We start by describing the performance of MM. 

4.1 $(2 - \frac{1}{2d})$-Approximation

As in two-dimensional space, MM enumerates over the $O(n^d)$ possible medians.
For each median, it constructs a candidate solution of the $k$ closest points.

Lemma 6. MM is not better than a $(2 - 1/(2d))$-approximation.
Proof. We construct an example based on the cross-polytope in $d$ dimensions,
i.e., the $d$-dimensional $L_1$ unit ball. Let $\varepsilon > 0$. Denote the origin with $O$ and
the $i$-th unit vector with $e_i$. The example has $k/2$ points at $O$ and at $O + e_1$. In
addition, there are $k/(4d)$ points at each of $O - (1 - \varepsilon)e_1$, $O + (2 - \varepsilon)e_1$, $O \pm (1 - \varepsilon)e_i$
for $i = 2, \ldots, d$, and $O + e_1 \pm (1 - \varepsilon)e_i$ for $i = 2, \ldots, d$. MM does best with $O$
or $O + e_1$ as median, giving a total distance of $(2 - 1/(2d))(1 - O(\varepsilon))k^2/4$.

Optimal is the points at $O$ and $O + e_1$, giving a total distance of $k^2/4$. □
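The same evaluation works in $d$ dimensions. The sketch below builds the cross-polytope instance, runs MM over all $O(n^d)$ candidate medians, and reports MM's cost; the optimum is taken from the lemma ($k^2/4$) rather than recomputed, since exhaustive search is slow here. The values of $d$, $k$ and $\varepsilon$ are illustrative, with $k$ a multiple of $4d$.

```python
from itertools import product
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def total_l1(points: List[Point]) -> float:
    """Sum of pairwise L1 distances, one coordinate at a time after sorting."""
    m, total = len(points), 0.0
    for axis in range(len(points[0])):
        coords = sorted(p[axis] for p in points)
        total += sum((2 * i - m + 1) * c for i, c in enumerate(coords))
    return total


def axis_point(d: int, coords: Dict[int, float]) -> Point:
    """Point with the given {axis: value} entries and zeros elsewhere."""
    return tuple(coords.get(t, 0.0) for t in range(d))


def lemma6_points(d: int, k: int, eps: float) -> List[Point]:
    """Cross-polytope instance from the proof of Lemma 6 (axis 0 plays e_1)."""
    a = 1 - eps
    heavy = [axis_point(d, {}), axis_point(d, {0: 1.0})]           # O and O + e_1
    light = [axis_point(d, {0: -a}), axis_point(d, {0: 2 - eps})]  # O - (1-eps)e_1, O + (2-eps)e_1
    for i in range(1, d):
        for s in (a, -a):
            light.append(axis_point(d, {i: s}))          # O +/- (1-eps)e_i
            light.append(axis_point(d, {0: 1.0, i: s}))  # O + e_1 +/- (1-eps)e_i
    heavy_pts = [p for p in heavy for _ in range(k // 2)]
    light_pts = [p for p in light for _ in range(k // (4 * d))]
    return heavy_pts + light_pts


def mm_cost(points: List[Point], k: int) -> float:
    """MM in d dimensions: every combination of input coordinates is a candidate median."""
    d = len(points[0])
    axes = [sorted({p[t] for p in points}) for t in range(d)]
    best = float("inf")
    for c in product(*axes):
        near = sorted(points, key=lambda p: sum(abs(x - y) for x, y in zip(p, c)))[:k]
        best = min(best, total_l1(near))
    return best
```

For $d = 3$ and $k = 12$, dividing `mm_cost(lemma6_points(3, 12, 1e-3), 12)` by $k^2/4 = 36$ gives a value just below $2 - 1/6$.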

Establishing a matching upper bound can be done analogously to Section 2.1.
Lemma 2 holds for general dimensions. The rest is based on the following lemma,
which is a higher-dimensional version of Claim 5 in the proof of Theorem 1.

Lemma 7. Worst-case arrangements for MM can be assumed to have all points
at positions $(0, \ldots, 0)$ and $\pm e_i$, where $e_i$ is the $i$-th unit vector.

Sketch of Proof. Consider a worst-case arrangement within the cross-polytope
centered at the origin with radius 1. Local moves consist of continuous changes in
point coordinates, performed in a way that preserves the number of coordinate
values. This means that when we move a point having a coordinate value different from
$0, 1, -1$, all other points sharing that coordinate value are moved along with it to keep
the identical coordinates the same, analogous to the proof of Theorem 1.

Note that under these moves, $|OPT|$ and $|MM|$ are locally linear,
so the ratio $|MM|/|OPT|$ is locally constant, strictly increasing, or strictly
decreasing. If a move decreases the ratio, the opposite move increases it, contradicting
the assumption that the arrangement is worst-case.

If the ratio is locally constant during a move, it remains extremal
until an event occurs, i.e., until the number of coordinate identities between
points increases or the number of point coordinates equal to 0, 1, or −1 increases. While
there are points with coordinates different from 0, 1, −1, there is always a move
that decreases the total number of degrees of freedom, until all dn degrees of freedom have
been eliminated. Thus, we can always reach an arrangement whose point coordinates
all take values in {0, 1, −1}. This leaves the origin and the 2d positions
±e_i as the only possible positions within the cross-polytope. □
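The last step of the sketch, that coordinates restricted to {0, 1, −1} leave only the origin and the 2d vectors ±e_i inside the cross-polytope, can be confirmed by direct enumeration for small d. This short Python illustration is not part of the proof, and the helper name is ours:

```python
from itertools import product


def lattice_points_in_cross_polytope(d: int) -> list[tuple[int, ...]]:
    """Points with every coordinate in {-1, 0, 1} that lie in the
    d-dimensional L_1 unit ball (the cross-polytope of radius 1)."""
    return [p for p in product((-1, 0, 1), repeat=d)
            if sum(abs(c) for c in p) <= 1]


for d in range(1, 5):
    # Expect 2d + 1 points: the origin and the 2d vectors ±e_i.
    print(d, len(lattice_points_in_cross_polytope(d)))
```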

Exploiting symmetry, the restricted set of arrangements can be evaluated to yield

Theorem 5. For points lying in d-dimensional space, MM is a (2 − 1/(2d))-approximation
algorithm, and this bound is tight.
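As a numerical sanity check of the tightness half of Theorem 5, the following Python sketch builds the lower-bound arrangement from the proof above and runs a simplified MM on it. Assumptions: MM is modeled as "try each distinct input point as the median, select the k points nearest to it in L_1, keep the cheapest selection"; the paper's MM may use other candidate medians and tie-breaking, and all function names are hypothetical. The optimum is not searched for; the sketch only evaluates the set the proof names as optimal.

```python
from itertools import combinations

Point = tuple[float, ...]


def l1(p: Point, q: Point) -> float:
    return sum(abs(a - b) for a, b in zip(p, q))


def total_pairwise(points: list[Point]) -> float:
    """Sum of L_1 distances over all unordered pairs."""
    return sum(l1(p, q) for p, q in combinations(points, 2))


def mm_sketch(points: list[Point], k: int) -> tuple[float, list[Point]]:
    """Hypothetical MM variant: each distinct input point is tried as the
    median, its k L_1-nearest points are selected, the cheapest set wins."""
    best_cost, best_sel = float("inf"), []
    for c in dict.fromkeys(points):          # distinct candidates, stable order
        sel = sorted(points, key=lambda p: l1(p, c))[:k]
        cost = total_pairwise(sel)
        if cost < best_cost:
            best_cost, best_sel = cost, sel
    return best_cost, best_sel


def axis(d: int, i: int, s: float) -> Point:
    """The vector s * e_i in d dimensions (axis i is 0-indexed)."""
    return tuple(s if j == i else 0.0 for j in range(d))


def shift(p: Point, q: Point) -> Point:
    return tuple(a + b for a, b in zip(p, q))


def lower_bound_instance(d: int, k: int, eps: float) -> list[Point]:
    """k/2 points at O and at O + e_1, plus k/(4d) points at each of the
    4d - 2 perturbed positions listed in the proof."""
    assert k % (4 * d) == 0, "k must be divisible by 4d"
    origin, e1 = axis(d, 0, 0.0), axis(d, 0, 1.0)
    pts = [origin] * (k // 2) + [e1] * (k // 2)
    extra = [axis(d, 0, -(1 - eps)), axis(d, 0, 2 - eps)]
    for i in range(1, d):
        for s in (1 - eps, -(1 - eps)):
            extra.append(axis(d, i, s))               # O ± (1 - eps) e_i
            extra.append(shift(e1, axis(d, i, s)))    # O + e_1 ± (1 - eps) e_i
    for p in extra:
        pts += [p] * (k // (4 * d))
    return pts


def claimed_opt(d: int, k: int) -> float:
    """Cost of k/2 points at O together with k/2 points at O + e_1."""
    return total_pairwise([axis(d, 0, 0.0)] * (k // 2) + [axis(d, 0, 1.0)] * (k // 2))


for d in (2, 3):
    k = 8 * d
    mm_cost, _ = mm_sketch(lower_bound_instance(d, k, eps=0.01), k)
    print(d, mm_cost / claimed_opt(d, k), 2 - 1 / (2 * d))
```

With eps = 0.01 the printed ratios come out slightly below 7/4 (d = 2) and 11/6 (d = 3), and they approach those values as eps shrinks, in line with the (1 + O(ε)) factor in the proof.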

4.2 PTAS for General Dimensions 

Theorem 6. For any fixed d, the problem of selecting a subset of minimum total
L_1 distance for a set of points in ℝ^d admits a PTAS.

Sketch of Proof. For m = O(1/ε), we subdivide the set of n points with d(m + 1)
axis-aligned hyperplanes, such that m + 1 of them are normal to each coordinate
direction. Moreover, any set of m + 1 hyperplanes normal to the same coordinate
axis is assumed to subdivide the optimal solution into m subsets of k/m points each, called
slices. Enumerating all possible structures of this type yields O(n^{m+1})
choices of hyperplanes in each coordinate, for a total of O(n^{d(m+1)}) possible choices.
For each choice, we have a total of m^d cells, each containing between 0 and k
points; thus, there are O(n^{m^d}) different distributions of cardinalities to the
different cells. As in the two-dimensional case, each cell picks the assigned number
of points extremal in its gradient direction.

It is easily seen that for each coordinate x_j, the above choice minimizes the
total sum of x_j-distances between points not in the same x_j-slice. The remaining
technical part (showing that the sum of distances within slices is small
compared with the distances between different slices) is analogous to the details
described for the two-dimensional case and is omitted. □
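One way to see why each cell can simply take points extremal in a fixed direction: for two points in different x_j-slices the sign of p_j − q_j is known, so the total inter-slice x_j-distance is linear in each point's coordinates, with coefficient (number of points in lower slices) − (number in higher slices). The Python sketch below checks this identity numerically and shows the resulting per-cell selection rule. It illustrates only that step, not the PTAS (which also enumerates hyperplanes and cell cardinalities); all names are hypothetical, slices are formed by sorting into equal groups, and the weight vector w is our own derivation.

```python
import random
from itertools import combinations

Point = tuple[float, ...]


def slice_indices(points: list[Point], j: int, m: int) -> list[int]:
    """Assign each point to one of m equal-size x_j-slices (0 = lowest),
    mimicking hyperplanes normal to axis j that split the set evenly.
    Assumes len(points) is divisible by m."""
    size = len(points) // m
    order = sorted(range(len(points)), key=lambda i: points[i][j])
    idx = [0] * len(points)
    for rank, i in enumerate(order):
        idx[i] = rank // size
    return idx


def inter_slice_distance(points: list[Point], idx: list[int], j: int) -> float:
    """Direct sum of |p_j - q_j| over pairs lying in different x_j-slices."""
    return sum(abs(points[a][j] - points[b][j])
               for a, b in combinations(range(len(points)), 2)
               if idx[a] != idx[b])


def linearized_distance(points: list[Point], idx: list[int], j: int, m: int) -> float:
    """Same quantity as a linear function of the coordinates: a point in
    slice s lies above s * size points and below (m - 1 - s) * size points."""
    size = len(points) // m
    return sum(p[j] * (2 * s - m + 1) * size for p, s in zip(points, idx))


def cell_weight(cell: tuple[int, ...], m: int, size: int) -> tuple[int, ...]:
    """Gradient of the inter-slice cost for a point in the cell whose
    slice index along axis j is cell[j]."""
    return tuple((2 * s - m + 1) * size for s in cell)


def pick_extremal(candidates: list[Point], w: tuple[int, ...], count: int) -> list[Point]:
    """Choose `count` candidates minimizing <w, p>, i.e. extremal in direction -w."""
    return sorted(candidates, key=lambda p: sum(a * b for a, b in zip(w, p)))[:count]


random.seed(1)
pts = [(random.random(), random.random()) for _ in range(12)]
for j in (0, 1):
    idx = slice_indices(pts, j, m=3)
    print(j, inter_slice_distance(pts, idx, j), linearized_distance(pts, idx, j, 3))
```

The two printed numbers agree for each axis; because the cost of a cell's points is ⟨w, p⟩ summed over the chosen points, picking the assigned number of points with smallest ⟨w, p⟩ is optimal within the cell.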

5 Experiments 

The work discussed so far is motivated by the allocation of a single job. In 
the following, we examine how well our algorithms allocate streams of jobs; now 
the set of free processors available for each job depends on previous allocations. 

To understand the interaction between the quality of an individual allocation
and the quality of future allocations, we ran a simulation involving pairs of
algorithms. One algorithm, the situation algorithm, places each job; this determines
the free processors available for the next job. Each allocation decision serves as
an input to the other algorithm, the decision algorithm. Each entry in Table 1
represents the average sum of pairwise distances for the decision algorithm with
processor availability determined by the situation algorithm.
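The situation/decision pairing can be sketched as an event loop in which only the situation algorithm's placements change the set of free processors, while the decision algorithm's placement on that same free set is scored. This is a minimal Python sketch under stated assumptions: the event tuples, the function names, and the two toy allocators (row-major and column-major first fit) are hypothetical stand-ins, not MC1x1, MM, MM+Inc, or HilbertBF, and the workload-archive trace format is not modeled.

```python
from itertools import combinations
from typing import Callable

Proc = tuple[int, int]
Allocator = Callable[[set[Proc], int], list[Proc]]


def pairwise_sum(procs: list[Proc]) -> int:
    """Sum of hop (L_1) distances over all unordered pairs of processors."""
    return sum(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in combinations(procs, 2))


def evaluate_pair(events, free: set[Proc], situation: Allocator, decision: Allocator) -> float:
    """Average pairwise-distance sum of the decision algorithm's allocations
    when processor availability is dictated by the situation algorithm.
    `events` holds ("start", job_id, size) or ("end", job_id) tuples."""
    free = set(free)
    held: dict[int, list[Proc]] = {}
    costs = []
    for ev in events:
        if ev[0] == "start":
            _, job, size = ev
            costs.append(pairwise_sum(decision(free, size)))  # scored only
            held[job] = situation(free, size)                  # committed
            free -= set(held[job])
        else:
            free |= set(held.pop(ev[1]))                       # job finished
    return sum(costs) / len(costs)


def row_major(free: set[Proc], size: int) -> list[Proc]:
    """Toy allocator: the first `size` free processors in row order."""
    return sorted(free)[:size]


def column_major(free: set[Proc], size: int) -> list[Proc]:
    """Toy allocator: the first `size` free processors in column order."""
    return sorted(free, key=lambda p: (p[1], p[0]))[:size]


mesh = {(r, c) for r in range(4) for c in range(4)}
events = [("start", 1, 4), ("start", 2, 4), ("end", 1), ("start", 3, 2)]
print(evaluate_pair(events, mesh, row_major, row_major))
print(evaluate_pair(events, mesh, row_major, column_major))
```

Passing the same allocator as both arguments yields a diagonal entry of the table, where an algorithm lives with the consequences of its own earlier placements.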

Our simulation used the algorithms MC1x1, MM, MM+Inc, and HilbertBF.
MM+Inc applies local improvement to the allocation of MM: it repeatedly replaces an allocated
processor with an excluded processor whenever this improves the average pairwise distance,
until it reaches a local minimum. HilbertBF is the 1-dimensional strategy of
Leung et al. [18] used on Cplant. The simulation used the LLNL Cray T3D
trace from the Parallel Workloads Archive [10]. This trace has 21323 jobs run
on a machine with 256 processors, treated as a 16 × 16 mesh in the simulation.
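A minimal Python sketch of swap-based local improvement in the spirit of MM+Inc, assuming a 2D mesh with L_1 hop distance and a steepest-descent rule (always apply the swap with the largest gain). The text does not say which improving swap MM+Inc picks, and all names here are hypothetical. Since a swap changes only the swapped processor's distances to the other allocated processors, each candidate swap is scored in O(k) time.

```python
from itertools import combinations
from typing import Optional

Proc = tuple[int, int]


def hops(a: Proc, b: Proc) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def pairwise_sum(procs: list[Proc]) -> int:
    return sum(hops(a, b) for a, b in combinations(procs, 2))


def best_swap(alloc: list[Proc], free: set[Proc]) -> Optional[tuple[int, Proc, Proc]]:
    """Return (gain, out, in) for the swap that most reduces the pairwise
    sum, or None if no swap with an unallocated free processor helps."""
    best = None
    outside = free - set(alloc)
    for out in alloc:
        rest = [p for p in alloc if p != out]
        old = sum(hops(out, p) for p in rest)
        for cand in outside:
            gain = old - sum(hops(cand, p) for p in rest)
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, out, cand)
    return best


def local_improve(alloc: list[Proc], free: set[Proc]) -> list[Proc]:
    """Apply improving swaps until none exists (a local minimum)."""
    alloc = list(alloc)
    while (move := best_swap(alloc, free)) is not None:
        _, out, cand = move
        alloc[alloc.index(out)] = cand
    return alloc


grid = {(r, c) for r in range(3) for c in range(3)}
start = [(0, 0), (0, 2), (2, 0), (2, 2)]
improved = local_improve(start, grid)
print(pairwise_sum(start), pairwise_sum(improved))
```

In MM+Inc the starting allocation would be MM's; the example starts from the four corners only for illustration. Each applied swap strictly lowers the integer pairwise-distance sum, so the loop terminates.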



                       Decision Algorithm
Situation Algorithm    MC1x1      MM   MM+Inc   HilbertBF
MC1x1                   5256    5218     5207        5432
MM                      5323    5285     5276        5531
MM+Inc                  5319    5281     5269        5495
HilbertBF               5090    5059     5046        5207



Table 1. Average sum of pairwise distances when the decision algorithm makes 
allocations with input provided by the situation algorithm. 



In each row, the decision algorithms are ranked (from best to worst) in the order MM+Inc, MM, MC1x1,
and HilbertBF. This is consistent with the worst-case performance bounds: MM
is a 7/4-approximation, MC1x1 is a 4-approximation, and HilbertBF has
approximation ratio Ω(N) on an N × N mesh.

6 Conclusions 

The algorithmic work described in this paper is one step toward developing 
algorithms for scheduling mesh-connected network-limited multiprocessors. We 
have given provably good algorithms to allocate a single job. The next step is to 
study the allocation of job sequences, a markedly different algorithmic challenge. 

The difference between making a single allocation and a sequence of
allocations is already illustrated by the diagonal entries in Table 1, where the free
processors depend on the same algorithm's previous decisions. These give the
ranking (from best to worst) HilbertBF, MC1x1, MM+Inc, and MM. The
locally better decisions of MM+Inc seem to paint the algorithm into a corner over
time. Figures 2 and 3 help explain why. When starting on an empty grid, MC
produces connected rectangular shapes. Locally, these shapes are slightly worse
than the round shapes produced by MM, but rectangles have better packing
properties because they avoid leaving small patches of isolated grid nodes.

We confirmed this behavior over an entire trace using Procsimity [28,29],
which simulates messages moving through the network. We ran the NASA Ames
iPSC/860 trace from the Parallel Workloads Archive [10], scaling down the
number of processors for each job by a factor of 4. This made the trace run
on a machine with 32 processors, small enough that we could compute, at each
step, the greedy placement that minimizes average pairwise distance. For average job flow time,
MC1x1 was best, followed by MM and then greedy. We did not run MM+Inc
in this simulation. HilbertBF was much worse than all three of these algorithms,
in part because of difficulties in using it on a nonsquare mesh.

Based on these results and the work of Leung et al. [18], one of the first
allocators developed and licensed for the partially completed Red Storm
supercomputer uses a machine-specific space-filling curve and a 1D bin-packing
technique. We expect to have Red Storm implementations of a 3D version of
MC1x1 and of the greedy heuristic (called MM) analyzed in this paper.
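The text names the ingredients of HilbertBF and of the Red Storm allocator (a space-filling curve plus 1D bin packing) without spelling them out. The following Python sketch is one hedged reading, assuming a square n × n mesh with n a power of two, the textbook iterative Hilbert-curve index, and best fit over maximal runs of free processors in curve order. The fallback used when no run is long enough is our own assumption, and the function names are hypothetical; this is not the Cplant or Red Storm code.

```python
Proc = tuple[int, int]


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along the Hilbert curve that fills an n x n
    grid (n a power of two), using the standard iterative conversion."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                 # rotate the quadrant so the sub-curve lines up
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_best_fit(free: set[Proc], k: int, n: int) -> list[Proc]:
    """Allocate k processors: best fit over runs of consecutive free
    processors in Hilbert order, else a minimum-span fallback (assumed)."""
    order = sorted(((x, y) for x in range(n) for y in range(n)),
                   key=lambda p: hilbert_index(n, p[0], p[1]))
    # Maximal runs of consecutive free processors along the curve.
    runs: list[list[Proc]] = []
    cur: list[Proc] = []
    for p in order:
        if p in free:
            cur.append(p)
        elif cur:
            runs.append(cur)
            cur = []
    if cur:
        runs.append(cur)
    fitting = [r for r in runs if len(r) >= k]
    if fitting:
        return min(fitting, key=len)[:k]   # best fit: shortest run that holds k
    # Fallback (our assumption): k free processors spanning the fewest curve positions.
    pos = [(i, p) for i, p in enumerate(order) if p in free]
    if len(pos) < k:
        raise ValueError("not enough free processors")
    start = min(range(len(pos) - k + 1), key=lambda i: pos[i + k - 1][0] - pos[i][0])
    return [p for _, p in pos[start:start + k]]


mesh = {(x, y) for x in range(4) for y in range(4)}
print(hilbert_best_fit(mesh, 4, 4))   # a 2 x 2 block on an empty 4 x 4 mesh
```

Because consecutive Hilbert positions are mesh neighbors, a contiguous run along the curve tends to be compact in 2D, which is what makes a 1D bin-packing rule usable on a mesh; on a nonsquare mesh this simple curve does not apply directly.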

Thus, the online problem in the iterated setting is the most interesting open
problem. We believe that a natural line of attack is to consider online packing of
rectangular shapes of given area. We plan to pursue this in future work.